Get price data

Getting daily price data for any stock is as easy as using quantmod package and getSymbols function.

  • fetches data as xts time series object
  • note src and updated in str output
  • auto.assign:
    • TRUE: creates xts object in environment with name of symbol
    • FALSE: prints results OR can be used to save results to object of any chosen name
getSymbols("CSIQ", auto.assign=TRUE) ## auto.assign=TRUE saves object
## [1] "CSIQ"
csiq_prices <- getSymbols("MSFT", auto.assign=FALSE) ##auto.assign=FALSE to save to specified object
## xts object
str(CSIQ)
## An 'xts' object on 2007-01-03/2023-01-13 containing:
##   Data: num [1:4037, 1:6] 11 10.8 10.5 10.1 10 ...
##  - attr(*, "dimnames")=List of 2
##   ..$ : NULL
##   ..$ : chr [1:6] "CSIQ.Open" "CSIQ.High" "CSIQ.Low" "CSIQ.Close" ...
##   Indexed by objects of class: [Date] TZ: UTC
##   xts Attributes:  
## List of 2
##  $ src    : chr "yahoo"
##  $ updated: POSIXct[1:1], format: "2023-01-14 17:57:45"

Entire history of the daily stock price (at least going back to 2007) is now available in an xts object named after the stock symbol.

  • default source is yahoo finance (src=‘yahoo’)
  • default start date is ‘2007-01-01’ (from=‘2007-01-01’)

Provides daily Open, High, Low, Close, Volume, Adjusted (aka OHLCVA):

## show first rows 
head(CSIQ)
##            CSIQ.Open CSIQ.High CSIQ.Low CSIQ.Close CSIQ.Volume CSIQ.Adjusted
## 2007-01-03     10.98     11.00    10.65      10.82      176300         10.82
## 2007-01-04     10.82     10.82    10.39      10.41      164100         10.41
## 2007-01-05     10.50     10.70    10.12      10.15       68600         10.15
## 2007-01-08     10.10     10.34     9.95       9.98      195500          9.98
## 2007-01-09     10.00     10.05     9.77       9.77       85900          9.77
## 2007-01-10      9.69      9.97     9.61       9.68       77800          9.68
## show last rows
tail(CSIQ)
##            CSIQ.Open CSIQ.High CSIQ.Low CSIQ.Close CSIQ.Volume CSIQ.Adjusted
## 2023-01-06     32.26     36.21    32.16      36.15     2386400         36.15
## 2023-01-09     36.60     39.42    36.60      38.57     2458700         38.57
## 2023-01-10     38.57     42.18    38.57      42.05     2145300         42.05
## 2023-01-11     41.44     43.77    41.29      42.87     2642100         42.87
## 2023-01-12     43.05     43.38    41.49      43.18     1716300         43.18
## 2023-01-13     42.69     44.07    42.36      43.00     1190700         43.00

“Adjusted” = prices adjusted for stock splits and dividends (possibly other developments? major new issues? buy-backs?)

Incl. TSX

  • Includes stocks on TSX
  • Optionally import into specified object.
  • Specify src from variety of available data sources
## specify object with auto.assign=FALSE
ogi <- getSymbols(Symbols='OGI.TO', auto.assign=FALSE, src='yahoo')

Filter for time period

  • When importing: specific period (start date and end date)
getSymbols("CSIQ", from="2018-01-01", to=Sys.Date(), auto.assign = TRUE)
## [1] "CSIQ"
CSIQ_2018 <- CSIQ

head(CSIQ_2018)
##            CSIQ.Open CSIQ.High CSIQ.Low CSIQ.Close CSIQ.Volume CSIQ.Adjusted
## 2018-01-02     16.98     17.16    16.88      17.14      736100         17.14
## 2018-01-03     17.18     17.20    16.83      17.07      987600         17.07
## 2018-01-04     17.13     17.16    16.78      17.06      799100         17.06
## 2018-01-05     17.00     17.19    16.90      17.01      768900         17.01
## 2018-01-08     17.03     17.20    16.85      17.10      727100         17.10
## 2018-01-09     17.08     17.09    16.72      16.95      725500         16.95
## reset full CSIQ for later
getSymbols("CSIQ", auto.assign = TRUE)
## [1] "CSIQ"
  • filter after import
CSIQ_2020 <- CSIQ['2020-01-01/'] ## from specified date fwd
CSIQ_90d <- tail(CSIQ, n=90) ## last n 

## selected period
CSIQ_Aug <- CSIQ['2020-08/2020-08-31'] ## Aug 1-31
  • filter with first and last functions in xts
## data in last 3 months (calendar mths, data so far)
CSIQ_3_mth <- last(CSIQ, "3 months" )
## combine first & last to get: first week from last (mont recent) year
## - includes the days of week 1 in the year
CSIQ_first_week <- first(last(CSIQ, "1 year"), "1 week")

More Time Series Manipulation examples with xts below in separate section.

Get multiple stocks

Each imported as separate object

getSymbols(c("NEP","SPWR"), auto.assign = TRUE)
## [1] "NEP"  "SPWR"
## OR
stocklist <- c("NEP","SPWR","FSLR")
getSymbols(stocklist, aauto.assign = TRUE)
## [1] "NEP"  "SPWR" "FSLR"

Merge multiple stocks

  • use merge function to join cols
  • max 2 data sets at a time, unless use do.call
## TWO AT A TIME
## first two
renewables <- merge(NEP, SPWR, join='outer', fill=NA)
head(renewables)
##            NEP.Open NEP.High NEP.Low NEP.Close NEP.Volume NEP.Adjusted
## 2007-01-03       NA       NA      NA        NA         NA           NA
## 2007-01-04       NA       NA      NA        NA         NA           NA
## 2007-01-05       NA       NA      NA        NA         NA           NA
## 2007-01-08       NA       NA      NA        NA         NA           NA
## 2007-01-09       NA       NA      NA        NA         NA           NA
## 2007-01-10       NA       NA      NA        NA         NA           NA
##            SPWR.Open SPWR.High SPWR.Low SPWR.Close SPWR.Volume SPWR.Adjusted
## 2007-01-03  24.34185  25.14735 23.87034   24.13883     1364069      24.13883
## 2007-01-04  24.16503  24.16503 23.18271   23.85069     1446833      23.85069
## 2007-01-05  23.81794  23.81794 23.22855   23.56254      543765      23.56254
## 2007-01-08  23.76555  25.13425 23.68697   24.73477      872375      24.73477
## 2007-01-09  24.65619  25.49443 24.49902   25.33071     1042177      25.33071
## 2007-01-10  25.27832  27.35429 25.23903   27.01375     2544746      27.01375
## add another one
renewables <- merge(renewables, FSLR, join='outer', fill=NA)
head(renewables)
##            NEP.Open NEP.High NEP.Low NEP.Close NEP.Volume NEP.Adjusted
## 2007-01-03       NA       NA      NA        NA         NA           NA
## 2007-01-04       NA       NA      NA        NA         NA           NA
## 2007-01-05       NA       NA      NA        NA         NA           NA
## 2007-01-08       NA       NA      NA        NA         NA           NA
## 2007-01-09       NA       NA      NA        NA         NA           NA
## 2007-01-10       NA       NA      NA        NA         NA           NA
##            SPWR.Open SPWR.High SPWR.Low SPWR.Close SPWR.Volume SPWR.Adjusted
## 2007-01-03  24.34185  25.14735 23.87034   24.13883     1364069      24.13883
## 2007-01-04  24.16503  24.16503 23.18271   23.85069     1446833      23.85069
## 2007-01-05  23.81794  23.81794 23.22855   23.56254      543765      23.56254
## 2007-01-08  23.76555  25.13425 23.68697   24.73477      872375      24.73477
## 2007-01-09  24.65619  25.49443 24.49902   25.33071     1042177      25.33071
## 2007-01-10  25.27832  27.35429 25.23903   27.01375     2544746      27.01375
##            FSLR.Open FSLR.High FSLR.Low FSLR.Close FSLR.Volume FSLR.Adjusted
## 2007-01-03     30.17     30.29    28.20      28.50      487000         28.50
## 2007-01-04     28.11     28.84    28.00      28.75      325500         28.75
## 2007-01-05     28.65     28.65    27.73      28.15      524300         28.15
## 2007-01-08     28.26     29.00    28.26      28.90      317000         28.90
## 2007-01-09     29.00     29.77    28.95      29.27      232800         29.27
## 2007-01-10     29.22     29.67    29.09      29.16      267700         29.16
## MULTIPLE (2+) AT A TIME
## set list of symbols
stockrenew <- c("BEP","TERP","VSLR")
## set up a new environment to hold the imported data
data_env <- new.env()
## get symbols
getSymbols(Symbols=stockrenew, env=data_env, auto.assign=TRUE)
## [1] "BEP"
## create list of objects within the environment
data_list <- lapply(data_env, Cl) ## apply function: Cl=close; OHLCV (not sure how to get all)
renewables2 <- do.call(merge, data_list)
head(renewables2)
##            BEP.Close
## 2007-01-03  8.686384
## 2007-01-04  8.643752
## 2007-01-05  8.643752
## 2007-01-08  8.515854
## 2007-01-09  8.553157
## 2007-01-10  8.553157

Multiple - most efficient

Most efficient code for getting multiple stocks at once.

  • use cbind to combine cols in separaet object as you go
## specify stock symbols
stockrenew3 <- c("ENPH","SEDG","RUN","JKS")
## set empty object
renewables3 <- NULL
## loop through each stock, binding cols as you go
for(stocks in stockrenew3){
  renewables3 <- cbind(renewables3,
                      getSymbols(Symbols=stocks, 
                                 from='2018-01-01', periodicity='daily', 
                                 auto.assign = FALSE))
}
head(renewables3)
##            ENPH.Open ENPH.High ENPH.Low ENPH.Close ENPH.Volume ENPH.Adjusted
## 2018-01-02      2.39      2.66     2.38       2.60      852700          2.60
## 2018-01-03      2.62      2.64     2.55       2.58      434100          2.58
## 2018-01-04      2.62      2.62     2.43       2.50      547500          2.50
## 2018-01-05      2.52      2.53     2.42       2.47      512100          2.47
## 2018-01-08      2.46      2.49     2.34       2.41      953000          2.41
## 2018-01-09      2.40      2.45     2.35       2.38      948000          2.38
##            SEDG.Open SEDG.High SEDG.Low SEDG.Close SEDG.Volume SEDG.Adjusted
## 2018-01-02     37.65    38.200    37.35      38.05      407300         38.05
## 2018-01-03     38.25    39.150    37.85      39.05      908400         39.05
## 2018-01-04     39.55    39.650    36.45      37.35     1013600         37.35
## 2018-01-05     37.35    38.375    37.15      37.90      553100         37.90
## 2018-01-08     38.00    38.523    37.75      38.35      487500         38.35
## 2018-01-09     38.35    38.500    36.65      37.20      666700         37.20
##            RUN.Open RUN.High RUN.Low RUN.Close RUN.Volume RUN.Adjusted JKS.Open
## 2018-01-02     5.88     5.98    5.88      5.92     672400         5.92    24.12
## 2018-01-03     5.92     6.14    5.92      6.05     933100         6.05    25.21
## 2018-01-04     6.05     6.09    5.91      6.06     798500         6.06    24.21
## 2018-01-05     6.07     6.09    5.94      6.01    1062000         6.01    24.30
## 2018-01-08     6.00     6.10    5.92      6.05     466100         6.05    24.25
## 2018-01-09     6.03     6.08    5.96      6.01     564000         6.01    23.70
##            JKS.High JKS.Low JKS.Close JKS.Volume JKS.Adjusted
## 2018-01-02    25.44   24.12     25.06     396600        25.06
## 2018-01-03    25.21   24.08     24.33     534000        24.33
## 2018-01-04    24.42   24.00     24.19     422800        24.19
## 2018-01-05    24.96   24.07     24.23     328500        24.23
## 2018-01-08    24.50   23.96     24.13     358300        24.13
## 2018-01-09    24.54   23.70     24.00     798600        24.00
  • CLOSE only: specify [,4] at end of getSymbols
## get CLOSE only: same as above but SPECIFY [,4] at end
renewables3c <- NULL
for(stocks in stockrenew3){
  renewables3c <- cbind(renewables3c,
                      getSymbols(Symbols=stocks, 
                                 from='2018-01-01', periodicity='daily', 
                                 auto.assign = FALSE)[,4])
}
head(renewables3c)
##            ENPH.Close SEDG.Close RUN.Close JKS.Close
## 2018-01-02       2.60      38.05      5.92     25.06
## 2018-01-03       2.58      39.05      6.05     24.33
## 2018-01-04       2.50      37.35      6.06     24.19
## 2018-01-05       2.47      37.90      6.01     24.23
## 2018-01-08       2.41      38.35      6.05     24.13
## 2018-01-09       2.38      37.20      6.01     24.00

Data frame conversion

## convert xts to data frame - will work with dygraphs
csiq_df <- data.frame(csiq_prices)
## row names will be dates - move to date col
csiq_df$date <- ymd(rownames(csiq_df))
## [opt] set rownames to number as in typical data frame (won't work with dygraphs)
rownames(csiq_df) <- 1:nrow(csiq_df)
## [opt] move date col to first left
csiq_df <- csiq_df %>% select(date,1:(ncol(csiq_df)-1))

Visualize

quantmod has its own ‘chartSeries’ / ‘chart_Series’, Dygraphs pkg enables interactive charts designed to work with quantmod right out of the box.

Several options:

  • chartSeries: basic charting from Quantmod pkg
    • works with xts
    • understands .Open, .High, .Low, .Close
    • basic version has vol
    • easy to add technical layers
  • chart_Series: experimental version of basic charting from Quantmod pkg, seems like similar features, different default formatting
  • dygraphs: more interactivity, works with quantmod, xts
    • works with data frame as long as rownames are dates
    • doesn’t have the technical analysis features (afaik); limited tweaking of format - mostly automatic
  • ggplot: if data converted from xts to data frame (example above)

chartSeries: basic line charts (non-interactive)

Candlestick

  • chartSeries, chart_Series: single stock, OHLC candlestick by default
## basic chart_Series from quantmod
## - two versions chartSeries (orig) and chart_Series (new experimental)
chartSeries(CSIQ_90d, name="CSIQ - OHLC & vol")

chart_Series(CSIQ_90d, name="CSIQ - OHLC") 

## same:
#chart_Series(OHLC(CSIQ_90d))
  • Open is either bottom of green bar OR top of red bar
  • High is highest point (either line or top of bar)
  • Low is lowest point (either line or bottom of bar)
  • Close is either top of green bar OR bottom of red bar

Technical features

  • SMA: simple moving average
  • MACD: moving average convergence divergence signals
  • BBands: Bollinger bands
  • other features are available in quantmod (not shown)
chartSeries(CSIQ_90d, name="CSIQ - OHLC & Technical Indicators", TA=c(addVo(),addBBands(), addMACD()))

  • add more technical with chart_Series - layer in one chart at a time
chart_Series(OHLC(CSIQ), subset='2019-12/2020-07')

add_SMA(n=30, col='blue')

add_SMA(n=7, col='green')

add_MACD() 

add_BBands()

addMomentum()

  • same idea all-in-one (leaving out momentum - doesn’t work with chart_Series)
## semi-colon separated TA functions
chart_Series(OHLC(CSIQ), subset='2019-12/2020-07', main="Full Tech Indicators",
             TA="add_SMA(n=30, col='blue');
             add_SMA(n=7, col='green');
             add_MACD();
             add_BBands()")

Specific cols

  • can isolate specific columns
## two variations
chartSeries(Cl(CSIQ_90d), name="CSIQ - Closing Prices")

## new charting
chart_Series(Cl(CSIQ_90d), name="CSIQ - Closing Prices")

Multiple stocks

  • multiple stocks, one chart (but how to identify each stock?)
chart_Series(Cl(renewables3c))

chartSeries(renewables, theme=chartTheme("white"))

Dygraph line/candle (interactive)

Lines / Candlestick

## use dygraphs pkg
dygraph(OHLC(CSIQ_90d))
cndl <- dygraph(OHLC(CSIQ_90d))
dyCandlestick(cndl)

Single col

  • closing price only
# chart_Series(Cl(CSIQ)) #chart_Series seems to sometimes cause probs with pandoc(?)
dygraph(Cl(CSIQ))

Annotations!

Easily add annotations

## shorter date range for readability / relevance
CSIQ_covid <- CSIQ_2020['/2021-05-30']
## find date of peak close for annotation
#max(CSIQ_covid$CSIQ.Close)
peak_date <- index(CSIQ_covid[Cl(CSIQ_covid)==max(Cl(CSIQ_covid))])

## multi-step process
## setup chart parameters
#graph <- dygraph(OHLC(CSIQ_covid), main="Cdn Solar")
## compile/add event annotations
#graph <- dyEvent(graph, c("2020-02-21",as.character(peak_date)),
#                          c("Start of covid crash","Post-covid peak"),
#                 labelLoc="top")
## print candlestick
#dyCandlestick(graph)

## pipe together chart info and event annotations individually for more control
dygraph(OHLC(CSIQ_covid), main="Cdn Solar with annotation") %>%
  dyEvent("2020-02-21","Start of covid crash", labelLoc="top") %>%
  dyEvent(as.character(peak_date),"Post-covid peak", labelLoc="bottom") %>%
  dyCandlestick()

Show only adjusted close, filtered for 2020 - early 2021

CSIQ_2020 <- CSIQ['2020-01-01/2021-01-31']
graph <- dygraph(Ad(CSIQ_2020), main="Cdn Solar")
graph <- dyEvent(graph, "2020-02-21","Start of covid crash",
                 labelLoc="top")
graph <- dyEvent(graph, "2021-01-21","Peak after covid crash",
                 labelLoc="bottom")
graph
  • just under 1 yr from crash to peak

Adjusted prices

Adjusted price is supposed to adjust for stock splits, while other prices are actual on the trading day.

However…Tesla (TSLA) had a 5-for-1 stock split on Aug 3 dropping price from ~$2,000 to ~$400. Prices collected as of Sep 5, 2020 show post-split throughout.

getSymbols("TSLA", auto.assign = TRUE)
## [1] "TSLA"
## filter for Aug, pre-split
TSLA_presplit <- TSLA['2020-08/2020-08-30']
tail(TSLA_presplit)
##            TSLA.Open TSLA.High TSLA.Low TSLA.Close TSLA.Volume TSLA.Adjusted
## 2020-08-21  136.3173  139.6993 135.0033   136.6653   322344000      136.6653
## 2020-08-24  141.7520  141.9333 128.5013   134.2800   300954000      134.2800
## 2020-08-25  131.6593  135.1967 131.2000   134.8893   159883500      134.8893
## 2020-08-26  137.3333  144.4000 136.9087   143.5447   213591000      143.5447
## 2020-08-27  145.3640  153.0400 142.8333   149.2500   355395000      149.2500
## 2020-08-28  153.0080  154.5660 145.7680   147.5600   301218000      147.5600
#dygraph(OHLC(TSLA))
## set width and height in pixels
dygraph(Ad(TSLA), width=400, height=300)
dygraph(TSLA[,c(4,6)], width=400, height=300)

Get RETURNS

  • Prices are great, but what REALLY counts is RETURNS!
  • Quantmod has functions for different returns by periodicity (daily, weekly, monthly, quarterly, annual)
  • periodReturn is the underlying function
  • can use allReturns to calculate all return periods at once, depending on date range.
  • PerformanceAnalytics pkg is the definitive R pkg for returns and portfolio optimization - so this is quick coverage and will dive more into PerformanceAnalytics in another file.
CSIQ_90d_return <- dailyReturn(CSIQ_90d)
head(CSIQ_90d_return)
##            daily.returns
## 2022-09-07   0.059738434
## 2022-09-08  -0.069134230
## 2022-09-09  -0.016910376
## 2022-09-12  -0.035564807
## 2022-09-13   0.016148421
## 2022-09-14   0.006166935
dygraph(CSIQ_90d_return, width=600, height=250)

All Returns data

  • one command to calculate returns for all periodicities - assuming date range is wide enough.
CSIQ_returns <- allReturns(CSIQ)
head(CSIQ_returns)
##                   daily      weekly monthly quarterly yearly
## 2007-01-03           NA          NA      NA        NA     NA
## 2007-01-04 -0.037892791          NA      NA        NA     NA
## 2007-01-05 -0.024975985 -0.07559199      NA        NA     NA
## 2007-01-08 -0.016748768          NA      NA        NA     NA
## 2007-01-09 -0.021042084          NA      NA        NA     NA
## 2007-01-10 -0.009211873          NA      NA        NA     NA

Annual returns

  • zoom in on annual returns.
## subset for annual return data and chart
CSIQ_returns[!is.na(CSIQ_returns$yearly),5] %>%
dygraph(main="CSIQ Annual Returns")
head(CSIQ_returns[!is.na(CSIQ_returns$yearly),5])
##                yearly
## 2007-12-31  1.5637523
## 2008-12-31 -0.7705151
## 2009-12-31  3.4613003
## 2010-12-31 -0.5700902
## 2011-12-30 -0.7853107
## 2012-12-31  0.2781955
  • would be nicer to have bar chart
  • how about quick ggplot after conversion to data frame?
csiq_ret_df <- data.frame(CSIQ_returns[!is.na(CSIQ_returns$yearly),5])
csiq_ret_df$date <- ymd(rownames(csiq_ret_df))
ggplot(csiq_ret_df, aes(x=as.factor(date), y=yearly))+geom_col()+
  scale_y_continuous(labels=percent_format())+
  labs(title="Yearly Returns", x="", y="annual % returns")+
  theme(axis.text.x = element_text(angle=45, hjust=1))

## or simplified x-axis
csiq_ret_df$year <- year(csiq_ret_df$date)
csiq_ret_df %>% ggplot(
  aes(x=year, y=yearly))+geom_col()+
  scale_y_continuous(labels=percent_format())+
  labs(title="Yearly Returns", x="", y="annual % returns")

Stats on returns

summary(CSIQ_90d_return)
##      Index            daily.returns       
##  Min.   :2022-09-07   Min.   :-0.0691342  
##  1st Qu.:2022-10-07   1st Qu.:-0.0250181  
##  Median :2022-11-08   Median :-0.0063732  
##  Mean   :2022-11-09   Mean   : 0.0003253  
##  3rd Qu.:2022-12-11   3rd Qu.: 0.0189893  
##  Max.   :2023-01-13   Max.   : 0.1378660
sum(CSIQ_90d_return)
## [1] 0.0292809
ggplot(CSIQ_90d_return, aes(x=CSIQ_90d_return))+geom_histogram()+
  geom_vline(xintercept = median(CSIQ_90d_return))+
  geom_vline(xintercept = quantile(CSIQ_90d_return, 0.2))+
  geom_vline(xintercept = quantile(CSIQ_90d_return, 0.8))+
  scale_y_continuous(expand=c(0,0))+
  scale_x_continuous(labels=percent)+
  labs(title="Daily returns over the period", 
       subtitle='Lines: 20th percentile, Median, 80th percentile',
       x='Daily % chg')

Time Series Manipulation

Leverage xts to work with time series objects (probably warrants its own file)

End points by period

  • day, week, month, qtr, yr.
  • returns the locations of each last item in the specified type of period, aligned on calendar.
  • will include the last item available in the last period (last data point).
  • returns a vector starting with 0 and ending with the extent (last row) of your data.
  • can also use k to skip: on=‘years’, k=4 returns location of every 4th yr in data (plus last)
## get index number of last day of each year
index_yr <- endpoints(CSIQ, on="years")
## apply to time series to extract selected end points
CSIQ_yr_end <- CSIQ[index_yr]
head(CSIQ_yr_end[,c(2,3,5)]) ## selected cols for space
##            CSIQ.High CSIQ.Low CSIQ.Volume
## 2007-12-31     29.15    27.26     1461900
## 2008-12-31      6.95     6.37     1024900
## 2009-12-31     29.44    28.81      686600
## 2010-12-31     12.41    12.12      510700
## 2011-12-30      2.70     2.53      350400
## 2012-12-31      3.43     3.25      506200
index_week_end <- endpoints(CSIQ_Aug, on="weeks")
CSIQ_Aug[index_week_end, 1:4] ## open, high, low, close selected
##            CSIQ.Open CSIQ.High CSIQ.Low CSIQ.Close
## 2020-08-07     25.00     26.72    24.62      25.32
## 2020-08-14     26.41     26.41    25.51      25.89
## 2020-08-21     25.59     26.65    25.59      25.96
## 2020-08-28     29.25     32.72    28.92      32.52
## 2020-08-31     32.45     34.10    31.78      32.49

Summarization

##
period.apply(CSIQ_Aug, INDEX=index_week_end, FUN=mean)
##            CSIQ.Open CSIQ.High CSIQ.Low CSIQ.Close CSIQ.Volume CSIQ.Adjusted
## 2020-08-07    23.526    24.352   23.142     23.822     1688280        23.822
## 2020-08-14    25.044    25.778   24.274     25.160     1739680        25.160
## 2020-08-21    26.142    26.854   25.756     26.176     1371340        26.176
## 2020-08-28    27.702    29.188   27.424     28.798     2473340        28.798
## 2020-08-31    32.450    34.100   31.780     32.490     2033100        32.490

Shortcut summarization

## apply.weekly, apply.monthly, apply.quarterly, apply.yearly to summarize
## FUN=<whatever summarization function for the period)
apply.weekly(CSIQ_Aug, FUN=mean)
##            CSIQ.Open CSIQ.High CSIQ.Low CSIQ.Close CSIQ.Volume CSIQ.Adjusted
## 2020-08-07    23.526    24.352   23.142     23.822     1688280        23.822
## 2020-08-14    25.044    25.778   24.274     25.160     1739680        25.160
## 2020-08-21    26.142    26.854   25.756     26.176     1371340        26.176
## 2020-08-28    27.702    29.188   27.424     28.798     2473340        28.798
## 2020-08-31    32.450    34.100   31.780     32.490     2033100        32.490

Split into periods

## split data into weeks - chunk the data into 'week' components, without summarizing
CSIQ_Aug_weeks <- split(CSIQ_Aug, f='weeks')
## second week: high, low, close selected
CSIQ_Aug_weeks[[2]][,2:4]
##            CSIQ.High CSIQ.Low CSIQ.Close
## 2020-08-10     25.48    23.94      24.46
## 2020-08-11     24.72    23.32      24.04
## 2020-08-12     25.73    24.00      25.03
## 2020-08-13     26.55    24.60      26.38
## 2020-08-14     26.41    25.51      25.89

Aggregate time periods

  • Aggregate data by period
  • Can take a single variable and calculate OHLC statistics by period
CSIQ_yearly <- to.period(CSIQ[,6], period='years', name='CSIQ_adj', OHLC=TRUE)
CSIQ_yearly
##            CSIQ_adj.Open CSIQ_adj.High CSIQ_adj.Low CSIQ_adj.Close
## 2007-12-31         10.82         30.61         7.14          28.15
## 2008-12-31         28.67         51.00         3.78           6.46
## 2009-12-31          6.49         30.15         3.00          28.82
## 2010-12-31         30.99         32.71         9.20          12.39
## 2011-12-30         12.49         16.38         2.20           2.66
## 2012-12-31          2.80          4.39         2.03           3.40
## 2013-12-31          3.40         32.00         3.15          29.82
## 2014-12-31         34.19         43.67        21.11          24.19
## 2015-12-31         24.69         39.09        15.72          28.96
## 2016-12-30         27.98         27.98        10.65          12.18
## 2017-12-29         12.67         18.87        11.15          16.86
## 2018-12-31         17.14         17.75        11.87          14.34
## 2019-12-31         14.44         24.99        14.44          22.10
## 2020-12-31         22.52         53.16        13.24          51.24
## 2021-12-31         52.64         64.15        30.18          31.29
## 2022-12-30         31.33         47.01        22.54          30.90
## 2023-01-13         30.30         43.18        30.30          43.00

Shortcut aggregation

CSIQ_2020_mth_agg <- to.monthly(CSIQ_2020[,6], name='CSIQ_adj')
CSIQ_2020_mth_agg
##          CSIQ_adj.Open CSIQ_adj.High CSIQ_adj.Low CSIQ_adj.Close
## Jan 2020         22.52         23.45        20.05          20.05
## Feb 2020         20.62         24.64        20.40          20.40
## Mar 2020         20.61         21.50        13.24          15.91
## Apr 2020         14.74         18.58        14.74          17.84
## May 2020         17.14         19.28        16.81          18.80
## Jun 2020         18.71         20.20        16.95          19.27
## Jul 2020         19.33         24.88        19.33          22.14
## Aug 2020         22.42         32.52        22.42          32.49
## Sep 2020         33.75         35.10        27.68          35.10
## Oct 2020         36.35         44.12        35.90          36.35
## Nov 2020         36.96         44.17        35.70          42.76
## Dec 2020         41.98         53.16        37.53          51.24
## Jan 2021         52.64         64.15        50.52          54.75

Merging data with difference frequencies

## get data -> DFF is daily Fed rate, FEDFUNDS is monthly
getSymbols(Symbols=c("FEDFUNDS","DFF"), src="FRED" )
## [1] "FEDFUNDS" "DFF"
head(DFF) ## daily data 
##             DFF
## 1954-07-01 1.13
## 1954-07-02 1.25
## 1954-07-03 1.25
## 1954-07-04 1.25
## 1954-07-05 0.88
## 1954-07-06 0.25
head(FEDFUNDS) ## mthly data
##            FEDFUNDS
## 1954-07-01     0.80
## 1954-08-01     1.22
## 1954-09-01     1.07
## 1954-10-01     0.85
## 1954-11-01     0.83
## 1954-12-01     1.28
## aggregate DFF to monthly by average
DFF_monthly <- apply.monthly(DFF, FUN=mean) ## FUN=aggregation calc
head(DFF_monthly) ## date as of end of month
##                  DFF
## 1954-07-31 0.7993548
## 1954-08-31 1.2206452
## 1954-09-30 1.0666667
## 1954-10-31 0.8487097
## 1954-11-30 0.8336667
## 1954-12-31 1.2777419
## convert to monthly index
index(DFF_monthly) <- as.yearmon(index(DFF_monthly))

## mow can merge with data sets that have first of month
merged_funds <- merge(FEDFUNDS, DFF_monthly)
head(merged_funds)
##            FEDFUNDS       DFF
## 1954-07-01     0.80 0.7993548
## 1954-08-01     1.22 1.2206452
## 1954-09-01     1.07 1.0666667
## 1954-10-01     0.85 0.8487097
## 1954-11-01     0.83 0.8336667
## 1954-12-01     1.28 1.2777419

Quandl for Importing

Claims ’the world’s most powerful data lives on Quandl. See also: Quandl API documentation.

  • Economic data and similar
  • Specify Source/Symbol
  • Several ‘types’ available: deflaut is data.frame
  • Does not appear that you can access stock prices without account/payment :(
## as of Jan 14, 2023 getting error msg that I need to get API key
#us_gdp <- Quandl(code='FRED/GDP')

#ggplot(us_gdp, aes(x=Date, y=Value))+geom_line()

PerformanceAnalytics [next]

PerformanceAnalytics pkg is considered best for analysis of returns and portfolio decisions. Will dive into it in separate file.